Acknowledging

R-Ladies Melbourne for the invitation to speak

R-Ladies community - welcoming and inclusive

Second goal of this talk!

Inspired and confident to dive in!


Assumptions

In the beginning, there was LaTeX

In the early 1980s, LaTeX was released.
LaTeX is a document preparation system.
Plain text + markup = defined structure (article, letter, bibliography)

Perfect for those who really care about typeface:

Kerning is the process of selectively adjusting the spacing between letter pairs to improve the overall appearance of text.


In the beginning, there was Sweave

In the early 2000s, along came Sweave.

A function that enables integration of R code into LaTeX documents

The purpose is “to create dynamic reports, which can be updated automatically if data or analysis change”. [1]

  1. Run each R script
  2. Then run latex
  3. Then run bibtex
  4. Then run latex again
  5. This generates a PDF

.footnote[ [1] Leisch, Friedrich (2002). “Sweave, Part I: Mixing R and LaTeX: A short introduction to the Sweave file format and corresponding R functions” (PDF). R News. 2 (3): 28–31.]
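The first step above can be sketched in R. This is a minimal illustration rather than the workflow from the talk: the file name `report.Rnw` and its contents are hypothetical, and only the weaving step is run, because the LaTeX and BibTeX steps need a LaTeX installation.

```r
# Minimal Sweave sketch; "report.Rnw" and its contents are hypothetical.
old_wd <- setwd(tempdir())

rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<summary, echo=TRUE>>=",
  "summary(1:10)",
  "@",
  "\\end{document}"
)
writeLines(rnw_lines, "report.Rnw")

# Run the R code and weave the results into report.tex
utils::Sweave("report.Rnw")
tex_lines <- readLines("report.tex")

# The remaining steps (latex, bibtex, latex again) need LaTeX installed:
# tools::texi2pdf("report.tex")

setwd(old_wd)
```

If the data or the analysis changes, rerunning the same commands regenerates the report.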


In the beginning, there was GNU

GNU Make:

  1. Create a Makefile
  2. Type make

See Karl Broman (https://kbroman.org/minimal_make/) for a fantastic tutorial.


Sweave is dead, long live knitr!

knitr is Sweave reborn!


Rmarkdown, easy peasy lemon squeezy

A flavour of Markdown specifically for R.

Rmarkdown - render + knit to html

Three main sections:
* YAML header
* code chunks
* markdown text


YAML (rhymes with “camel”)

YAML is a human-friendly, cross-language, Unicode-based data serialization language designed around the common native data structures of agile programming languages.

This block allows you to fine-tune the output of your document. YAML metadata allows for:
- TOC, tabbed sections, theme, highlight
- custom CSS
- evaluating R expressions, e.g. Sys.time()
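As a rough illustration of such a header, the sketch below writes a small R Markdown file and reads its YAML back into R. The file name, title, theme, and `styles.css` are all made up for the example. Rendering is left commented out because it needs pandoc and the CSS file.

```r
# Minimal sketch; file name and field values are hypothetical.
library(rmarkdown)

fence <- strrep("`", 3)  # three backticks, built here for readability

rmd_lines <- c(
  "---",
  "title: \"My analysis\"",
  "date: \"`r Sys.time()`\"",
  "output:",
  "  html_document:",
  "    toc: true",
  "    theme: cosmo",
  "    highlight: tango",
  "    css: styles.css",
  "---",
  "",
  "Some markdown text.",
  "",
  paste0(fence, "{r cars-summary, echo=FALSE}"),
  "summary(cars)",
  fence
)

rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Read the YAML header back as an R list
header <- yaml_front_matter(rmd_file)
str(header)

# Rendering needs pandoc (bundled with RStudio) and a styles.css file:
# render(rmd_file)
```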


Code chunks

three backticks{r chunk_name, options}
code!
three backticks

Common options:
* include (FALSE) - prevents code and results from appearing
* echo (FALSE) - include results (e.g. figures) but exclude the code
* message (FALSE) - prevents messages generated by code from appearing
* warning (FALSE) - as above but for warnings
* fig.cap - add captions to graphics
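To see two of these options in action, here is a small sketch that knits a text snippet instead of a file. The chunk names and contents are invented for the example. With `echo=FALSE` the result appears without the code, and with `include=FALSE` nothing appears at all.

```r
# Minimal sketch of chunk options; chunk names and code are hypothetical.
library(knitr)

fence <- strrep("`", 3)  # three backticks

src <- c(
  paste0(fence, "{r hidden_code, echo=FALSE}"),
  "x <- 6 * 7",
  "x",
  fence,
  "",
  paste0(fence, "{r silent_setup, include=FALSE}"),
  "y <- 'never shown'",
  fence
)

# knit() accepts the document as text and returns the knitted markdown
out <- knit(text = src, quiet = TRUE)
cat(out)
```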

Rmarkdown

Easy within RStudio:
File -> New File -> R Markdown

Chunks:
* Infrastructure - environment (e.g. libraries), loading data, defining analysis parameters
* Wrangling - code to transform data
* Communication - e.g. data visualization, summary tables
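One way to picture the three kinds of chunk is shown below, with each stage written as it might appear in its own chunk. The dataset, the `min_cyl` parameter, and the variable names are illustrative choices, not taken from the talk.

```r
# --- Infrastructure: environment, data, analysis parameters ---
min_cyl <- 6                     # hypothetical analysis parameter
cars_raw <- datasets::mtcars     # built-in example data

# --- Wrangling: transform the data ---
cars_subset <- cars_raw[cars_raw$cyl >= min_cyl, ]
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars_subset, FUN = mean)

# --- Communication: summary table (a plot could go here too) ---
print(mpg_by_cyl)
```

Keeping the stages in separate chunks makes it easier to set options such as `include` or `echo` per stage.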


Rmarkdown - best practices

Producing a book

“If you can type words, you can use bookdown”
@CivicAngela, RLadiesChicago


Bookdown



Going one step further

Why do we like workflowr?

Helps scientists organise their research in a way that promotes:
- reproducibility
- collaboration/sharing of results
- effective project management

Combines literate programming and version control

Final result:
A website, containing time-stamped, versioned, and documented results

workflowr in a nutshell (cont.)

library(workflowr)

wflow_view()
# build the .html files from the .Rmd files
wflow_build()
wflow_view()
wflow_status()
# commit the source files, then build and commit the website
wflow_publish(c("analysis/index.Rmd", "analysis/about.Rmd", "analysis/license.Rmd"),
              "Publish the initial files for MyProject")
wflow_status()
# create the GitHub repository MyProject
wflow_use_github("yourGitHub_username", "MyProject")
# check what would be pushed without pushing anything
wflow_git_push(dry_run = TRUE)
# ok?! then push for real
wflow_git_push()

Why workflowr

Once you are comfortable with the basics, go nuts with customisation.
e.g. https://github.com/timtrice/workflowr_skeleton


In addition

Sharing tidy, standardized, reproducible data sets for publications and collaborations can be challenging.
Read https://ropensci.org/blog/2018/09/18/datapackager/
Caveats, e.g. small size of data: see https://github.com/ropensci/DataPackageR


R in production - it will work just fine …

(HT @_ColinFay)

Defining production:

“Software environments that are used and relied on by real users with real consequences if things go wrong” - @_ColinFay

“Production is anything that is run repeatedly and that the business relies on”

Cloud computing

There is no R in AWS

  1. Use a bastion account
  2. Pay attention to IAM
  3. Log into EC2 instances via the AWS SSM Session Manager - no SSH key handling and fully audited.
  4. Programmable, repeatable infrastructure - infrastructure as code (IaC) - helps with resource management and tracking changes.

Running RStudio in AWS

Log into the AWS Management Console:

  1. Launch an EC2 instance
  2. Configure the security groups for ports 80 and 8787
  3. Install R
  4. Install RStudio
  5. Create a user for RStudio
  6. Open the public DNS (IPv4) address in a browser at the correct port (8787 is the RStudio Server default), and bingo!

R on AWS

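Make use of multiple cores:

# doMC relies on forking, so it is not available on Windows
install.packages(c("doMC", "foreach"))
library(foreach)
library(doMC)
# register one worker per available core
doMC::registerDoMC(cores = parallel::detectCores())
parallel::detectCores()     # number of cores on the machine
foreach::getDoParWorkers()  # number of workers foreach will use

Source: https://blog.sicara.com/speedup-r-rstudio-parallel-cloud-performance-aws-96d25c1b13e2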
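Once workers are registered, a loop can be spread across them with `%dopar%`. The sketch below is a toy example with a made-up workload (squaring numbers) and two workers chosen arbitrarily. It assumes `doMC` and `foreach` are already installed on a non-Windows machine.

```r
# Minimal parallel-loop sketch; the workload and worker count are hypothetical.
library(foreach)
library(doMC)

registerDoMC(cores = 2)

# Each iteration may run on a different worker; .combine = c joins the
# results into a single vector in iteration order
squares <- foreach(i = 1:8, .combine = c) %dopar% {
  i^2
}
print(squares)
```

For a loop this small the overhead of forking outweighs the gain; the pattern pays off when each iteration is slow.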


Resources